Data sharing in cancer research: A qualitative study exploring community members' preferences

Abstract
Background: Advancements in cancer treatment and survivorship rely on participation in research and access to health records.
Methods: This study explored preferences for data access and sharing in 14 workshops with 42 community members, most of whom were cancer survivors or carers. Various scenarios for data access and sharing were presented and discussed, with participants' preferences summarized using descriptive statistics. Reasons underlying these preferences were identified through a thematic analysis of workshop transcripts.
Results: Most participants indicated a willingness for researchers to use their self‐report data and current health records for a specific research project (86%). Many were also willing for their self‐report data and current (62%) or all future (44%) health records to be shared with other researchers for use in other studies if made aware of this. Willingness to consent to data access and sharing in cancer research was influenced by: (i) the potential for data sharing to advance medical discoveries and benefit people impacted by cancer in the future, (ii) transparency around researchers' credibility and their intentions for data sharing, (iii) level of ownership and control over data sharing, and (iv) protocols for privacy and confidentiality in data sharing.
Conclusions: Based on these themes, we present practical strategies for optimizing data access and sharing in cancer research.


| INTRODUCTION
With cancer a leading cause of mortality worldwide, 1 there is an ongoing need for research-driven innovations in cancer treatment and survivorship. 2 Achieving this requires researchers to collect, store, and analyze a wide range of data, including demographic, health, psychosocial, behavioral, and genetic information. 3 Collecting these data from large, representative samples in target populations is important for generating valid research outputs that can inform future interventions. 4 Further, using these data in other research projects or sharing them with other researchers reduces participant burden, prevents duplication of work, and supports mutual resource use. 5 Thus, there is a need to not only optimize community participation in cancer research but also facilitate efficient data collection and sharing.
The collection, use, storage, and sharing of human research data is tightly regulated through legislative and institutional policies and guidelines to protect the privacy and well-being of research participants. For example, in Australia, researchers are required to adhere to the principles and guidelines outlined in the National Statement on Ethical Conduct in Human Research in order to receive ethical approval and funding from research institutions and funding bodies. 6 In most instances, these guidelines require researchers to disclose information about the purpose of the research to prospective participants, including what their participation will involve, and obtain their voluntary consent to participate, which, if feasible, may be withdrawn at any stage. 6 Ethical approval also requires that personal information is stored securely and kept confidential. 6 Such regulation serves to protect individuals with respect to the use of their data but can be perceived as a barrier to processes such as data sharing and linkage. 7

When consumers are considering taking part in medical research, review evidence indicates that a key determinant of research participation is how their data will be collected, stored, and accessed. 8 [10][11] However, little is known about the principles that underlie consumers' preferences for how their data are used in cancer research and how studies could be designed to optimize data collection and sharing.

This paper explores community members' preferences for use of their self-report data and health records in cancer research. It also examines the reasons underlying consumers' preferences for data access and sharing in cancer research and provides practical recommendations for designing and conducting cancer research studies. Understanding and applying consumers' preferences for data access and sharing will support research integrity and may optimize community participation in cancer research.

| Context and setting
Data were collected for this study during workshops held to codesign a population-based cancer survivorship study in Queensland, Australia, with methods described elsewhere. 12 Briefly, co-design research actively involves consumers in developing, designing, implementing, and evaluating new products or services. 13 Consumer involvement in research can contribute to better study outcomes, including higher enrolment and retention rates. 14 The current study adopted a pragmatic approach to codesign with community members participating in workshops to design study invitation materials and a qualitative survey tool for collecting detailed information on the supportive care needs and experiences of people affected by cancer. 12 The session activities and visual materials used in the workshops were developed by the broader study team, including cancer survivorship researchers, medical oncologists, and executive and senior leaders in a cancer support organization and a population-based registry. This paper reports on findings from the final activity in the workshop that was completed in 14 of the 15 workshops (n = 42 community members from a total sample of 44) (see Figure 1). Ethical approval for the study was obtained from the University of Southern Queensland Human Research Ethics Committee (ref: ETH2023-0140).

| Participants and recruitment
Participant recruitment was undertaken from October to December 2023 using digital and printed flyers distributed via networks associated with Cancer Council Queensland or the broader research team. To support the recruitment of priority populations, such as culturally and/or linguistically diverse (CALD) groups, the research team submitted study information to a health consumers network for inclusion in their e-newsletter. 15 As data collection advanced, recruitment was supplemented through snowball sampling, with workshop participants invited to share recruitment flyers with friends, family members, and colleagues.

Community members were eligible to participate in the codesign workshops if they were aged 18 years or older, English-speaking, and residing in Queensland, Australia. Participants included people with a personal experience of cancer (either as someone diagnosed with cancer or a carer for someone with cancer), as well as community members who did not have a personal experience of cancer. The latter group was included as current cancer incidence rates mean it is likely that everyone will be impacted by cancer at some stage in their life, either as a patient or carer. 16 Participants were allocated to a workshop based on their nominated availability, with no more than five registered participants per workshop. Recruitment continued until a diverse sample had been achieved and the research questions had been adequately explored, determined by the authors through concurrent data collection and analysis. Due to the large number of online registrations for the interviews, participants were purposively sampled based on their demographic characteristics, including gender, ethnicity, Indigenous status, and geographical location, as well as their experience with cancer (i.e., survivor or carer) to ensure that diverse perspectives were represented.

| Data collection
Workshops were facilitated by two female researchers (SA, XB, and/or EJ) with undergraduate or postgraduate degrees in health science fields and training in qualitative data collection. The facilitators had no prior relationship with workshop participants. At the start of each workshop, the facilitators introduced themselves, including their role in the research team and their academic background. Workshops were conducted as online (n = 9), in-person (n = 1), or hybrid (i.e., online and in-person) (n = 4) sessions using Microsoft Teams. In-person participants attended the session at one of two not-for-profit organizations, where participants were provided with the relevant materials (e.g., pen, paper). Participants attending online were asked to source these materials themselves. Workshops were audio-recorded and transcribed using Microsoft Teams. After completing the workshop, participants received a voucher valued at AU$100.00 for their time (approx. 120 min).

Data for the current study were drawn from the final activity of the workshops, when participants were given a series of hypothetical scenarios developed by the research team to prompt discussion about preferences for data collection and sharing in cancer research (Table 1). Participants were advised that in the scenarios, 'self-report data' refers to information about themselves that they provide to researchers (e.g., needs and experiences), while 'health records' refers to information about themselves that researchers gather from registries or medical documents (e.g., cancer type and stage). The facilitator then provided a short explanation for each scenario (see italics in Table 1). For scenarios that involved data sharing, participants were informed that their information would be kept confidential. The lead facilitator asked each participant to verbally indicate which scenario (i.e., level of data sharing) they felt comfortable with and discuss the reasoning behind their choice. This often resulted in further discussion among group members around data sharing in cancer research.
FIGURE 1 Flowchart of participant recruitment and selection for the codesign workshops. a Invalid/ineligible responses to the workshop invitation were identified based on a combination of factors that can be indicative of phishing attempts (e.g., duplicate IP addresses with different names, invalid postcodes or phone numbers, replicated responses in a short period of time, and unusual completion times). Where validity could not be determined from the response, it was investigated further through phone and/or email contact with the respondent.
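
The screening criteria listed in the Figure 1 footnote can be expressed as simple rule-based flags. The following is a minimal sketch only: the `Response` fields, the `flag_responses` function, and the two-minute completion threshold are hypothetical and are not taken from the study, and only three of the listed criteria are illustrated. In line with the footnote, a flag marks a response for follow-up rather than automatic exclusion.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical fields for one registration form response.
    name: str
    ip_address: str
    postcode: str
    minutes_to_complete: float


def flag_responses(responses, min_minutes=2.0):
    """Return {index: [reasons]} for responses that warrant follow-up.

    min_minutes is an assumed threshold for an unusually fast completion.
    """
    # Collect the distinct names submitted from each IP address.
    names_by_ip = defaultdict(set)
    for r in responses:
        names_by_ip[r.ip_address].add(r.name.strip().lower())

    flags = {}
    for i, r in enumerate(responses):
        reasons = []
        if len(names_by_ip[r.ip_address]) > 1:
            reasons.append("duplicate IP address with different names")
        # Australian postcodes consist of four digits.
        if not (r.postcode.isdigit() and len(r.postcode) == 4):
            reasons.append("invalid postcode")
        if r.minutes_to_complete < min_minutes:
            reasons.append("unusual completion time")
        if reasons:
            flags[i] = reasons
    return flags


# Made-up example data (documentation-range IP addresses).
responses = [
    Response("Alex", "203.0.113.5", "4000", 6.5),
    Response("Sam", "203.0.113.5", "4000", 0.5),
    Response("Jo", "198.51.100.7", "40", 7.0),
    Response("Kim", "192.0.2.9", "4350", 8.0),
]
flagged = flag_responses(responses)
```

Keeping the reasons alongside each flagged response supports the manual check by phone or email, since the person following up can see why a response was queried.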

| Data analysis
Descriptive statistics were used to summarize participant characteristics and preferences for data access and sharing based on the scenarios presented. Workshop transcripts were analyzed using codebook thematic analysis to identify recurring patterns in the data. As described by Braun and Clarke, codebook thematic analysis is a structured approach to coding that conceptualizes themes as topic summaries of a central concept and is distinct from their reflexive approach to thematic analysis. 17 First, two members of the research team (XB, EJ) reviewed workshop transcripts to familiarize themselves with the data. Second, transcripts were coded inductively by one researcher (XB) based on the words used by participants to describe their data sharing preferences and reasons underlying their preferences. These codes were documented in a coding framework alongside representative participant quotes. Third, initial themes were generated by grouping codes together that addressed the same concept (e.g., the need for privacy and confidentiality). Within each theme, initial sub-themes were developed to capture the various perspectives and nuances expressed by participants for each concept. Fourth, themes and sub-themes were further developed through discussion with a second researcher (EJ) and documented using an audit trail. Finally, each theme was assigned a title to summarize the key concept that it represented.

| RESULTS
The characteristics of the 42 workshop participants are shown in Table 2. Participants had a median age of 43 years (range 23-79 years), and 31 (74%) were female. Thirty-four (81%) people had a personal experience of cancer, either as someone diagnosed with cancer (n = 18; 43%) or as a carer of someone diagnosed with cancer (n = 16; 38%). Eleven (26%) people were living in a regional or remote area, and 19 (45%) lived in an area of low to medium socioeconomic status. Nine (21%) people were born overseas, 4 (10%) spoke a language other than English at home, and 3 (7%) identified as Aboriginal and/or Torres Strait Islander.

| Preferences for data access and sharing in cancer research
Preferences for data access and sharing in cancer research are shown in Table 1. All participants agreed with the baseline scenario of providing self-report data directly to researchers for a specific research project, as they had done for the workshops reported in this paper.
In addition to providing self-report data, 36 (86%) would be willing to grant researchers access to their current health records for a specific research project. Twenty-six (62%) would be willing for their de-identified data and current health records from the original research project to be shared with other researchers for other projects if they were made aware of them. Fewer than half (n = 18; 44%) would be willing for their de-identified future health records to be shared with other researchers for other projects if they were made aware of them. Of the 26 people who would be willing for their self-report data to be shared with other researchers for other projects, five participants (19%) felt they would not need to be made aware of how their information was being used (i.e., receive information about the other research projects and other researchers). These participants included three cancer survivors, one cancer carer, and one person with no personal history of cancer.
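
The percentages in this section are proportions of a stated denominator, which differs between the full sample and the subgroup willing to share. As a minimal sketch (the function name `percent` and the dictionary labels are illustrative shorthand, not terms from the study), two of the full-sample figures and the subgroup figure can be reproduced as follows:

```python
def percent(count: int, total: int) -> int:
    """Return count/total as a whole-number percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total)


TOTAL_PARTICIPANTS = 42  # participants who completed the final activity

# Counts as reported in the text; labels are shorthand for this sketch.
willing = {
    "current health records, specific project": 36,
    "data and current records shared with other researchers": 26,
}

summary = {label: percent(n, TOTAL_PARTICIPANTS) for label, n in willing.items()}

# Subgroup proportion: the denominator is the 26 people willing to share,
# not the full sample of 42.
no_notification_needed = percent(5, 26)
```

Making the denominator explicit helps when reading the subgroup result: five of the 26 participants willing to share is 19% of that subgroup, which corresponds to about 12% of all 42 participants.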

| Reasons underlying preferences for data access and sharing in cancer research
Four themes were identified, representing key principles that underpinned participants' willingness to share their self-report data and health records in cancer research. These themes were: (i) the potential for data sharing to advance cancer research and benefit people impacted by cancer in the future, (ii) transparency around researchers' credibility and their intentions for data sharing, (iii) level of ownership and control over data sharing, and (iv) protocols for privacy and confidentiality (Table 3).

Personal beliefs regarding the value of research and importance of data encouraged data sharing:

[Data] is important, and it could help shape the way cancer is treated in the future if [researchers] have more data. The more data [researchers] have, the more knowledge they have about the situation.
(P7, Workshop M, no personal experience of cancer)

For some participants, their experience of cancer changed their perspective on data sharing. Having gained a better understanding of the importance of cancer research, they indicated they would be more willing to share their personal information:

I probably used to be on the other end of the scale and didn't really share much, particularly online. This cancer experience sort of flipped that for me. That's my 'why'… I would probably say yes to scenario 4 [future health records shared with other researchers for other projects]. I know based on my own experience with my son, the drug that saved him has never been used before (that we are aware of) in a child as young as him or his type of cancer.
(P35, Workshop C, carer)

TABLE 3 Themes and sub-themes identified in a thematic analysis of community members' preferences for data access and sharing in cancer research and reasons underlying those preferences.

Potential for data sharing to advance cancer research and benefit people impacted by cancer in the future
• Desire to contribute to new research discoveries and help other people impacted by cancer.
• Personal beliefs regarding the value of research and importance of data.
• New perspective on data sharing after cancer experience.
• Conflict between personal preferences for limited data sharing and contributing to "the greater good."

Transparency around researchers' credibility and their intentions for data sharing
• Need for transparency around researchers' intentions for data sharing prior to providing consent.
• Concerns that data could be sold to third parties or used for commercial purposes.
• Lower need for transparency around who is using the data if data sharing is anonymous and/or used to advance medical research.

Level of ownership and control over data sharing
• Need to retain ownership and control over data, particularly health records.
• Unclear why full health record would be relevant to cancer research.
• Hesitant to consent to researchers accessing future health record due to uncertainty and potential risk.
• Limited need for ownership and control over what data is shared if data sharing is de-identified, opportunity to opt-out in the future, and/or used to benefit others or advance medical research.
• Preference for data sharing influenced by previous experiences of data sharing in research studies.

Protocols for privacy and confidentiality
• Personal information needs to be de-identified when shared and not potentially reidentifiable.
• Concerns about privacy due to recent data breaches in the commercial space.
• Less concerned about privacy due to the volume of personal information publicly available online.
• Clear communication about protocols for privacy and confidentiality could encourage data sharing.

Some participants expressed a conflict between their personal preference for limited data sharing and being more open to data sharing for "the greater good."

Participants expressed concerns that their data could be sold to third parties or used for commercial purposes rather than "public good." Participants indicated that the level of transparency around researchers' credibility and intentions would influence their willingness to share data as part of cancer research:

I would need to be aware of who [the researchers] are. It could change my mind… Some people come to the table with terrific credibility. Others you may not know about. [You may be giving] the chance for someone to sell your information because we all know that happens. [I would] probably need to be aware of who they are, rather than just 'researchers', for me to be comfortable with that.
(P4, Workshop O, carer)

There are a number of researchers that are looking to privatize their ideas.

Some participants were unclear why all the information in their health record would be relevant to cancer research. This limited their willingness to share this data, specifically records not related to their cancer diagnosis:

I've opted out of my health record because I don't feel it is anybody's business knowing my vaccination status or any of my past history.
(P25, Workshop G, personal cancer diagnosis)

If it was going to cancer research, the only records that would be relevant in my mind is the cancer-related records. To throw in a lung condition … I think it would be more confusing. Not my full records. A lot of it wouldn't be relevant at all.
(P18, Workshop I, personal cancer diagnosis/carer)

Other participants were hesitant to grant researchers access to their future health information, mostly due to uncertainty around their future situation and the risk of personal and sensitive information being shared.

Some participants expressed a limited need for ownership and control over their data if other conditions were met. These included de-identification, opportunity to opt out, and the data being used "for good":

If it is anonymous and being used to help or prevent something in the future, I personally don't see anything wrong with it.
(P41, Workshop A, carer)

I'm pretty liberal with data sharing as long as there is a way to opt out should I need to.
(P9, Workshop L, carer)

If my data can help someone in the future not suffer as much or lead to a cure or better treatment, something that improves outlook for future patients, I'd be all for that.
(P16, Workshop J, personal cancer diagnosis)

Other participants indicated that their preference for data sharing was influenced by the degree of data sharing that they have previously consented to as part of a research study:

My health records have already been out in the world. I have been in clinical trials to help future research for cancer. So, whatever I can do to help.
(P24, Workshop G, personal cancer diagnosis)

I've participated in another study where I took medications and I consented to my medical records [being used] for that study only… that is okay, definitely not just for any project.
(P19, Workshop I, carer)

3.6 | Theme 4: Protocols for privacy and confidentiality of data sharing in cancer research
This theme addressed participants' desire to know how their data would be shared with researchers and how it would be stored, protected and reported. Privacy and confidentiality were important to most participants, including their personal information being de-identified when shared and not potentially reidentifiable:

I wouldn't want my name and date of birth and address and [phone] number being shared with everyone. If I'm just an entity in the system, a '40-year-old male that had this' and 'this was my treatment', I'm totally fine.
(P16, Workshop J, personal cancer diagnosis)

As long as it [indicates] that my future health records will be shared in a non-identifiable manner… it has to be non-identifiable.
(P33, Workshop D, carer)

Some participants expressed concerns about privacy due to recent data breaches in the commercial space, emphasizing the importance of ensuring secure data storage and sharing practices:

Is there any way the data can get out to the public? [This is] the one thing I need to ask in light of certain companies having problems with security or going down, like [

| DISCUSSION
This study provides important information on the types of data that community members may be willing to share in cancer research, with whom, and for what purpose. In general, most community members are willing to share their self-report data and health records in a cancer research study, and many are willing to provide consent for this information to be shared with other researchers for other studies that they are made aware of. For example, community members were happy to consent to receiving information about future studies where their data could be used as they arose and to decide on data sharing on a case-by-case basis. In general, community members were less willing to share their future health records due to uncertainty around their future health and the potential for sensitive information to be disclosed. As this is a small qualitative study, we were unable to further explore preferences for data access and sharing in cancer research based on participant characteristics. While we identified four themes underlying community members' willingness to share their data in cancer research, the sub-themes presented within each theme reveal that these themes are intertwined and should be viewed together. For instance, if data sharing was anonymous and the research findings would benefit other people impacted by cancer in the future, then community members indicated a lower need for transparency around who was using their data and a lower need for ownership and control over the projects their data were used in. A recurring narrative throughout the themes was that clear communication around the potential benefits of the research, what data would be shared with whom and how this would be undertaken, and the protocols in place to ensure anonymity could encourage community members to consent to data sharing in cancer research. This highlights the importance of designing study invitation materials with community members to optimize the readability, relevance, and acceptability of these materials. 12

The findings presented in this paper expand on previous studies investigating data sharing preferences among people diagnosed with cancer. [10][11] The current study demonstrates the applicability of these findings in the Australian setting, not only with cancer survivors but also people caring for someone with cancer. A novel finding of this study is that the potential for research to benefit people affected by cancer in the future is a key motivation for consenting to data sharing. However, willingness to share data is often contingent on a combination of factors, such as the credibility of the research team and assurance of data anonymity, particularly in the context of increased cybersecurity concerns and breaches. By understanding these nuanced preferences, researchers can better design study protocols that align with the values and expectations of the community, thereby enhancing participation in cancer research.
Progressing data sharing in cancer research will require broad engagement with a range of stakeholders. In Australia, while the current data landscape includes many population-based datasets that are relevant to cancer research, there are several barriers to data sharing. 18 These include siloed datasets with data custodians bound by restrictive legislation and approval processes for data access that can take years to negotiate and execute. 18 These challenges are long-standing, with a systematic review published 10 years ago reporting 20 potential barriers to data sharing in public health across technical, motivational, economic, political, legal, and ethical spheres. 19 While addressing these barriers requires extensive consultation with and commitment from stakeholders and data custodians, findings from our study demonstrate community support for greater data sharing in cancer research and contribute to community engagement efforts to establish protocols for advancing data sharing in cancer research.
Based on our findings, there are several steps that researchers can take to support data sharing in cancer research, despite the system-level barriers. For example, researchers can emphasize in study recruitment materials or during debrief sessions how participants' involvement in the research, including sharing data with other researchers for other projects, may benefit people impacted by cancer in the future and contribute to new medical discoveries, and how their anonymity will be maintained in data sharing and reporting. Researchers can also provide prospective participants with links to information about the research team and examples of how their previous work has been implemented in practice. If offering participants the option to consent to their data being used by other researchers for other projects, assure community members that they will be provided with information about each project and the researchers involved, and will have the opportunity to confirm consent or opt out of sharing their data. Finally, if asking community members to consent to their full health record being used for a cancer research study, provide information on why non-cancer related records are relevant to the proposed research.

| Strengths and limitations
This study included a diverse sample of community members, including people living in rural areas and those with English as a second language. A key limitation of this study is that participants were community members who had already consented to providing self-report data for a cancer research study. Therefore, findings from this study may not represent the views of community members in general, particularly those who prefer to not engage in research and those who experience geographical, technological, or language barriers to research participation; groups that are hard to reach but likely important to include in population-based cancer research studies. A large proportion of the workshop participants were female; however, there is limited evidence to suggest that gender influences data sharing preferences for health research in general. 20,21 While the themes identified in this study were discussed by participants with and without a personal experience of cancer, the preferences for data access and sharing reported by the small number of people with no personal experience of cancer could change if they were to be diagnosed, or care for someone, with cancer. 22

Groupthink is a commonly cited limitation of group-based qualitative research, where members of a group seek cohesion and conformity in decision-making rather than diversity and different perspectives. 23 While our workshops included group discussion around reasons underlying preferences for data access and sharing, the preferences reported by participants in nearly all the workshops were varied (see Table S1). Some participants did indicate that the group discussion challenged their own perspective on data sharing; however, few participants changed their preferred scenario after participating in the group discussion.

TABLE 1 Preferences for data access and sharing in cancer research among community members participating in workshops to codesign a population-based cancer research study.

Columns: Scenarios used to prompt discussion about use of data in cancer research (a); Total (N = 42); Personal experience of cancer: Diagnosis (N = 18), Carer (N = 16); No personal experience of cancer (N = 8).

3. I am okay with #2 + my self-report data and my current health records being shared with other researchers for other projects if I am made aware of them.
Your information would be kept confidential, but your information could be used to provide data for other projects that you are made aware of.

4. I am okay with #3 + future health records being shared with other researchers for other projects if I am made aware of them.
Similar to the previous option, but you are providing consent for ongoing collection of information about you from your health record that you will be made aware of.

a Scenarios are not mutually exclusive and build on each other (i.e., participants who selected Scenario 3 were also comfortable with Scenarios 1 and 2).
TABLE 2 Characteristics of participants in the 14 codesign workshops (n = 42).
Abbreviations: ARIA, Accessibility/Remoteness Index of Australia; SEIFA, Socio-Economic Indexes for Areas.
a Number of participants and percentage of sample unless otherwise stated.
b Higher scores indicate higher relative socioeconomic advantage and lower relative socioeconomic disadvantage in general. High = deciles 7-10, medium = deciles 4-6, low = deciles 1-3.

3.5 | Theme 3: Level of ownership and control over data sharing in cancer research
I guess I have some uneasiness around future health records because I don't yet know what my future health issues will be. If it was something around fertility, that might be something quite private, something that is